Implement gMSA for Windows upstream kubernetes tests #2208

jsturtevant · 2022-03-30T18:06:21Z

What type of PR is this?
/kind feature

What this PR does / why we need it:
This adds the infrastructure and setup required to test gMSA for Windows. See details in #1860.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #1860

Special notes for your reviewer:

This requires a one-time setup in the subscription using the provided script, scripts/gmsa/gmsa-setup.sh, which configures the required managed identities and access to Key Vault resources. Configuring that access requires special privileges (Microsoft.Authorization/roleassignments/write).

Other requirements:

image built with the gMSA plugin: "Install Azure Key Vault gMSA plugin if configured" image-builder#835 (satisfied with cncf-upstream:capi-windows:k8s-1dot23dot5-windows-2022-containerd:2022.03.29 and cncf-upstream:capi-windows:k8s-1dot23dot5-windows-2019-containerd:2022.03.30)
kubernetes e2e test built from 1.24 with the patch "Windows gmsa e2e: Don't assume bash is avaliable for webhook deployment" kubernetes/kubernetes#108899

I initially tried to implement this without introducing a new template, but felt it would be closer to a customer experience if we had one. I left the commits for now but can rebase once we get through reviews.

Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.

TODOs:

squashed commits
includes documentation
adds unit tests

Release note:

NONE

k8s-ci-robot · 2022-03-30T18:06:23Z

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

jsturtevant · 2022-03-31T22:48:25Z

Setup in the Azure subscription is complete.
/test pull-cluster-api-provider-azure-windows-containerd-upstream-with-ci-artifacts-serial-slow

jsturtevant · 2022-04-01T02:46:19Z

The serial slow test passed, and the gMSA tests ran and passed 🥳

{"msg":"PASSED [sig-windows] [Feature:Windows] GMSA Full [Serial] [Slow] GMSA support works end to end"
{"msg":"PASSED [sig-windows] [Feature:Windows] GMSA Full [Serial] [Slow] GMSA support can read and write file to remote SMB folder"

jsturtevant · 2022-04-01T16:45:21Z

/test pull-cluster-api-provider-azure-windows-containerd-upstream-with-ci-artifacts-serial-slow

jsturtevant · 2022-04-01T20:11:02Z

gMSA tests passed again.

/assign @CecileRobertMichon @marosset

marosset · 2022-04-01T21:55:13Z

It would be nice if we could figure out a way to not need new templates to run the gMSA tests.

jsturtevant · 2022-04-01T22:14:33Z

It would be nice if we could figure out a way to not need new templates to run the gMSA tests.

I called this out in the PR description. The first commit in the PR demonstrates how this would look without an additional template, in the gmsa.go file.

I moved to the template approach for a couple of reasons:

it simplified some of the setup of the identities and vnet peering that was done in the after-cluster-creation section. Basically, the only part done there now is what the k/k tests require; nothing there is needed for gMSA itself to work.
it adds coverage for managed identities that we don't cover in any other tests today
it is closer to what a customer setup would look like

I could go either way but lean toward having the additional template for the above reasons.

jackfrancis · 2022-04-04T21:58:32Z

Not to throw a wrench in current progress, but can the gMSA install pre-reqs be implemented as a helm chart?

jsturtevant · 2022-04-04T22:51:34Z

Not to throw a wrench in current progress, but can the gMSA install pre-reqs be implemented as a helm chart?

The only component that needs to be installed for the implementation here is the CCG Key Vault plugin, which is installed on the VM via image-builder. The webhook is installed at test run time, so it is not needed here.

There is a WIP for a chart if a customer wanted to use it; otherwise it would need to be installed separately. The CCG plugin is not part of the chart.

jackfrancis · 2022-04-04T23:05:03Z

@jsturtevant sounds like wrapping all of the needful capz stuff into a helm chart for installation via test CI is not worth the effort...

docs/book/src/developers/development.md

scripts/gmsa/ci-gmsa.sh

templates/test/ci/prow-ci-version-gmsa/kustomization.yaml

test/e2e/gmsa.go

CecileRobertMichon · 2022-04-08T22:43:29Z

test/e2e/gmsa.go

+	keyVaultClient.Authorizer = keyvaultAuthorizer
+
+	// Wait for the Cluster nodes to be ready (this is different than capi's ready as cni needs to finish initializing)
+	windowsCalico := &appsv1.DaemonSet{


This test assumes the Calico CNI. Would we ever want to run it with other CNIs? Would it be equivalent to check that the nodes are all "Ready"?

I did this before I learned about ControlPlaneWaiters: clusterctl.ControlPlaneWaiters{. Maybe that would be a better approach? It might also help with some flakes that happen before the Windows pods are fully ready.
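As a rough illustration of the "all nodes Ready" idea raised above, the sketch below checks node status without referring to any particular CNI. It is not code from this PR: the function name all_nodes_ready is hypothetical, and it assumes the default column layout of `kubectl get nodes --no-headers`, where the second column is STATUS.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this PR): reads `kubectl get nodes --no-headers`
# style lines from stdin and succeeds only if at least one node is listed and
# every node has "Ready" among the comma-separated values of its STATUS column.
all_nodes_ready() {
  awk '
    NF == 0 { next }                       # ignore blank lines
    {
      seen++
      n = split($2, parts, ",")            # e.g. "Ready,SchedulingDisabled"
      ok = 0
      for (i = 1; i <= n; i++) if (parts[i] == "Ready") ok = 1
      if (!ok) notready++
    }
    END { exit (seen > 0 && notready == 0) ? 0 : 1 }
  '
}
```

A possible use would be `kubectl get nodes --no-headers | all_nodes_ready` inside a retry loop. `kubectl wait --for=condition=Ready nodes --all --timeout=10m` is a built-in alternative that needs no parsing.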

CecileRobertMichon · 2022-04-08T22:49:22Z

scripts/gmsa/ci-gmsa.sh

+        GMSA_DOMAIN_ENVSUBST="${REPO_ROOT}/scripts/gmsa/domain.init"
+        GMSA_DOMAIN_FILE="${REPO_ROOT}/scripts/gmsa/domain.init.tmpl"
+        $ENVSUBST < "$GMSA_DOMAIN_FILE" > "$GMSA_DOMAIN_ENVSUBST"
+        az vm create -l "$AZURE_LOCATION" -g "$GMSA_NODE_RG" -n "$vmname" \


Did you consider using the Go SDK to run these prerequisites directly in the test suite instead of using the az CLI in a script? That would be similar to what we do for the private cluster custom vnet setup.

I did. Having the creation of the domain outside the test suite meant it could be used across test entry point scripts, or by someone outside the project for their own testing. This still requires a few additional steps before being able to run tests, so maybe bringing it into the test suite would be fine now.

The other aspect is that testing the domain creation was cumbersome, and having it outside the test suite made it easier to iterate on without having to modify the rest of the tests to be skipped while working on it.

I listed a bunch of ideas in #1860 (comment)

k8s-ci-robot · 2022-04-11T16:23:30Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please ask for approval from cecilerobertmichon after the PR has been reviewed.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

CecileRobertMichon · 2022-04-11T18:18:20Z

templates/test/ci/prow-ci-version-gmsa/patches/machine-identity.yaml

+    spec:
+      identity: UserAssigned
+      userAssignedIdentities:
+      - providerID: "/subscriptions/${AZURE_SUBSCRIPTION_ID}/resourceGroups/${CI_RG}/providers/Microsoft.ManagedIdentity/userAssignedIdentities/cloud-provider-user-identity"


Doesn't this require the fix in #2214 to work?

I had a similar question, but after discussing with @mboersma we figured out that since we are not testing any cloud-provider-specific flows in this set of tests, it doesn't exercise the code that would cause issues.
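For readers unfamiliar with the providerID shape used in the patch above, the sketch below assembles the same resource ID from its parts. The helper name identity_provider_id is hypothetical and only mirrors the string format shown in the diff; it says nothing about how capz validates or consumes the value.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the resource ID of a user-assigned identity in
# the same shape as the providerID in machine-identity.yaml.
#   $1 = subscription ID, $2 = resource group, $3 = identity name
identity_provider_id() {
  if [ "$#" -ne 3 ] || [ -z "$1" ] || [ -z "$2" ] || [ -z "$3" ]; then
    echo "usage: identity_provider_id <subscription-id> <resource-group> <identity-name>" >&2
    return 1
  fi
  printf '/subscriptions/%s/resourceGroups/%s/providers/Microsoft.ManagedIdentity/userAssignedIdentities/%s\n' \
    "$1" "$2" "$3"
}
```

For example, `identity_provider_id "$AZURE_SUBSCRIPTION_ID" "$CI_RG" cloud-provider-user-identity` would print the value that the template produces after variable substitution.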

marosset · 2022-04-11T22:43:39Z

scripts/ci-conformance.sh

    "${REPO_ROOT}/hack/log/redact.sh" || true
 }

 trap cleanup EXIT

 if [[ "${WINDOWS}" == "true" ]]; then
+  if [[ $KUBETEST_WINDOWS_CONFIG =~ "windows-serial-slow" ]]; then


Is this bit documented anywhere?
I can imagine someone having a bad day trying to troubleshoot why GMSA tests aren't running correctly somewhere in Prow, only to find this conditional...
Maybe we could at least add a log line like 'Skipping GMSA configuration' when we aren't performing the config, to help debugging?

It isn't. We currently run the GMSA tests in the serial/slow jobs. There isn't a strict requirement for this, and in fact the tests are pretty fast; just the initial setup of the cluster and domain is slow.

I used this because I didn't want to introduce yet another env variable, but maybe it would be better to have it as additional setup? It would make your suggestion and debugging simpler.
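A minimal sketch of the suggested log line follows, assuming the decision stays keyed on KUBETEST_WINDOWS_CONFIG. The function name maybe_configure_gmsa and the exact messages are hypothetical. The `case` pattern is a plain substring match, which is what the quoted right-hand side of `=~` in the diff above also does.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of this PR): logs which path was taken so CI
# output shows whether gMSA setup ran.
maybe_configure_gmsa() {
  case "${KUBETEST_WINDOWS_CONFIG:-}" in
    *windows-serial-slow*)
      echo "Configuring GMSA: KUBETEST_WINDOWS_CONFIG='${KUBETEST_WINDOWS_CONFIG:-}' contains windows-serial-slow"
      # The real setup would be invoked here; how scripts/gmsa/ci-gmsa.sh is
      # called from this point is an assumption and is left out.
      ;;
    *)
      echo "Skipping GMSA configuration: KUBETEST_WINDOWS_CONFIG='${KUBETEST_WINDOWS_CONFIG:-}' does not contain windows-serial-slow"
      ;;
  esac
}
```

Printing the variable's value in both branches means a misconfigured job shows the reason directly in its log, without anyone having to find the conditional first.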

scripts/gmsa/domain.init.tmpl

jsturtevant · 2022-04-27T22:13:46Z

Had a discussion in Slack and Zoom to cover the following in a bit more detail:

should we create a new CI template or do the configuration purely in code? (Implement gMSA for Windows upstream kubernetes tests #2208 (comment))
should the creation of the domain VM be done in the code of e2e.test or in a script? (Implement gMSA for Windows upstream kubernetes tests #2208 (comment))

Summary was:

Using a CI template to do the work for VM identities, vnet peering, and such is the best approach, but there was a big discussion as to whether this should live in the capz repo (including other e2e tests).
Along the same lines, we discussed whether the scripts that create the domain VMs and do the setup should really live inside the capz repo. The general agreement is that this might not be the best place for them.

Also discussed how to make this as reusable as possible for other providers, but so much of the setup is Azure-specific that there may not be much to reuse.

One possible idea was to move some of the testing templates and scripts to live in the sig-windows gmsa repo along with maybe using the outcome of az capi cli updates to create the clusters:

k8s-ci-robot · 2022-06-04T00:34:58Z

@jsturtevant: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

jsturtevant · 2022-07-21T22:29:05Z

Most of this work has been moved to kubernetes-sigs/windows-testing#328

/close

k8s-ci-robot · 2022-07-21T22:29:15Z

@jsturtevant: Closed this PR.

In response to this:

most of this work has been moved to kubernetes-sigs/windows-testing#328

/close

k8s-ci-robot requested review from mboersma and shysank March 30, 2022 18:06

jsturtevant force-pushed the gmsa branch 6 times, most recently from 4eb2c61 to 2f3c4cb Compare March 31, 2022 18:51

jsturtevant marked this pull request as ready for review April 1, 2022 02:46

k8s-ci-robot removed the do-not-merge/work-in-progress label Apr 1, 2022

k8s-ci-robot requested a review from devigned April 1, 2022 02:46

jsturtevant force-pushed the gmsa branch 3 times, most recently from c73ea82 to 5d9765f Compare April 1, 2022 16:33

k8s-ci-robot assigned CecileRobertMichon and marosset Apr 1, 2022

CecileRobertMichon reviewed Apr 8, 2022

CecileRobertMichon reviewed Apr 11, 2022

marosset reviewed Apr 11, 2022

Implement gMSA in conformance tests

56e8b44

jsturtevant force-pushed the gmsa branch from 3a0573d to 3bf45ae Compare April 27, 2022 17:52

Use a specific template for gMSA

e6c0a4b

jsturtevant force-pushed the gmsa branch from 3bf45ae to e6c0a4b Compare April 27, 2022 20:39

k8s-ci-robot added the needs-rebase label Jun 4, 2022

k8s-ci-robot closed this Jul 21, 2022
